Fix single-event chunks missing when using bulk export API #62138

Merged
kshi36 merged 2 commits into master from kevin/export-api-missing-events
Dec 10, 2025

@kshi36 kshi36 commented Dec 10, 2025

Fixes #61729

When using the bulk export API, the event handler can miss certain events that are processed in single-event chunks. This includes access_request.create and session.upload audit events. With this fix, the export API now correctly processes single-event chunks so the event handler can see them.

Big thanks to Forrest and Hugo for the detailed explanations and solution!

Manual Tests

Test: Event handler processes and forwards single-chunk events using bulk export API

  • When connected to a storage backend that supports the bulk export API (e.g. Teleport Cloud, Athena backend):
    • Verify that when an access request is created in the Web UI, the access_request.create event is forwarded to the fluentd audit events endpoint (test.log)
    • Verify that when a session is created in the Web UI, the session.upload event is forwarded to the fluentd audit events endpoint, and all session events (e.g. session.start, resize, session.end) are forwarded to the fluentd session events endpoint (session.*.log)

changelog: Fixed an Auth Service bug causing the event handler to miss up to 1 event every 5 minutes when storing audit events in S3

@kshi36 kshi36 marked this pull request as ready for review December 10, 2025 19:41
@github-actions github-actions Bot added the audit-log and size/md labels Dec 10, 2025
@github-actions github-actions Bot requested review from greedy52 and kiosion December 10, 2025 19:41

@hugoShaka hugoShaka left a comment

Thanks for the quick fix 👍

Could you edit the changelog to specify the impact of the bug (how many events), and which component they should update? Also I think we don't need to specify the event type as this bug affects all events.

e.g.

Changelog: Fixed an Auth Service bug causing the event-handler to miss up to 1 event every 5 minutes when storing audit events in S3.

Ran the test without the fix and it does fail :)

=== RUN   Test_querier_streamEventsFromChunk
    querier_test.go:1043: 
        	Error Trace:	/Users/shaka/go/src/github.com/gravitational/teleport/lib/events/athena/querier_test.go:1043
        	Error:      	"[]" should have 1 item(s), but has 0
        	Test:       	Test_querier_streamEventsFromChunk


kshi36 commented Dec 10, 2025

> impact of the bug (how many events)

Is the rate of missed events (1 event every 5 minutes) tied to the uploaded chunk containing only 1 event? If not, would the changelog be misleading?


hugoShaka commented Dec 10, 2025

> impact of the bug (how many events)
>
> Is the rate of missed events (1 event every 5 minutes) tied to the uploaded chunk containing only 1 event? If not, would the changelog be misleading?

The single-event chunk is the easiest way to reproduce the bug, but the reader seems to read events two at a time. So if a finished chunk has 11 events, you get five reads of 2 events each, and the last read returns (1, EOF). The last event is therefore dropped whenever len(chunk) % 2 == 1.
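To make the read pattern above concrete, here is a minimal Go sketch. It is not the Teleport code: `pairReader`, `drainBuggy`, `drainFixed`, and `makeEvents` are hypothetical names, and the assumption that the loss comes from checking the error before consuming the final batch is only inferred from the description in this thread.

```go
package main

import (
	"fmt"
	"io"
)

// pairReader is a hypothetical stand-in for the chunk reader. It returns
// events two at a time and reports io.EOF together with the final, short batch.
type pairReader struct {
	events []string
}

// Read returns the next batch of up to two events. When fewer than two
// events remain, it returns them (possibly none) along with io.EOF.
func (r *pairReader) Read() ([]string, error) {
	if len(r.events) < 2 {
		batch := r.events
		r.events = nil
		return batch, io.EOF
	}
	batch := r.events[:2]
	r.events = r.events[2:]
	return batch, nil
}

// drainBuggy checks the error first, so a batch returned alongside io.EOF
// is discarded.
func drainBuggy(r *pairReader) []string {
	var out []string
	for {
		batch, err := r.Read()
		if err != nil {
			return out
		}
		out = append(out, batch...)
	}
}

// drainFixed consumes whatever was returned before looking at the error.
func drainFixed(r *pairReader) []string {
	var out []string
	for {
		batch, err := r.Read()
		out = append(out, batch...)
		if err != nil {
			return out
		}
	}
}

// makeEvents builds n placeholder event names.
func makeEvents(n int) []string {
	events := make([]string, n)
	for i := range events {
		events[i] = fmt.Sprintf("event-%d", i)
	}
	return events
}

func main() {
	for _, n := range []int{1, 10, 11} {
		buggy := len(drainBuggy(&pairReader{events: makeEvents(n)}))
		fixed := len(drainFixed(&pairReader{events: makeEvents(n)}))
		fmt.Printf("n=%d buggy=%d fixed=%d\n", n, buggy, fixed)
	}
}
```

Under these assumptions the buggy loop loses one event for the 1-event and 11-event chunks and none for the 10-event chunk, which matches the parity argument. This is the same convention as Go's `io.Reader` contract, where callers are expected to process the returned data before considering the error.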

Assuming the parity of the number of events in each 5-minute window is random, we lose 1 event every 10 minutes on average, with a worst case of 1 event every 5 minutes.
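The estimate above can be written out as a short calculation. The symbols $N$ (events in one chunk) and $L$ (events lost from that chunk) are introduced here for illustration, and the 50/50 odd/even split is the same assumption as in the comment:

```latex
L = N \bmod 2, \qquad
\mathbb{E}[L] = 1 \cdot P(N \text{ odd}) + 0 \cdot P(N \text{ even}) = \tfrac{1}{2}
\ \text{event per 5-minute chunk}
\;\Rightarrow\; 1 \text{ event per 10 minutes on average.}
```

The worst case is $P(N \text{ odd}) = 1$, meaning every chunk has an odd number of events, which gives 1 lost event per 5-minute chunk.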

@kshi36 kshi36 added this pull request to the merge queue Dec 10, 2025
Merged via the queue into master with commit 053e12e Dec 10, 2025
50 checks passed
@kshi36 kshi36 deleted the kevin/export-api-missing-events branch December 10, 2025 21:47
@backport-bot-workflows

@kshi36 See the table below for backport results.

| Branch | Result |
|---|---|
| branch/v17 | Create PR |
| branch/v18 | Create PR |

21KennethTran pushed a commit that referenced this pull request Jan 6, 2026
* Fix single-event chunks missing when using bulk export API

* Fix lint

Development

Successfully merging this pull request may close these issues.

single-event chunks missing when using Teleport Event Handler with s3 storage

3 participants